United States Patent [19]

Millar 



[11] Patent Number: 4,823,296
[45] Date of Patent: Apr. 18, 1989



[54] FIRST ORDER DIGITAL FILTER WITH 
CONTROLLED BOOST/TRUNCATE 
QUANTIZER 

[75] Inventor: Paul G Millar, Felixstowe, England 

[73] Assignee: British Telecommunications public 
limited company, England 

[21] Appl. No.: 856,474 

[22] Filed: Apr. 28, 1986 

[30] Foreign Application Priority Data 

Apr. 30, 1985 [GB] United Kingdom 8510969 

[51] Int. Cl.4 ... G06F 15/31

[52] U.S. Cl. ... 364/724.03

[58] Field of Search . 364/745, 724, 733, 724.03 

[56] References Cited 

U.S. PATENT DOCUMENTS 

4,034,196 7/1977 Butterweck et al 364/745 

4,213,187 7/1980 Lawrence et al 364/724 

4,236,224 11/1980 Chang 364/724 



FOREIGN PATENT DOCUMENTS 

1522698 8/1978 United Kingdom . 
8001324 6/1980 World Int Prop. O. . 

OTHER PUBLICATIONS 

H. J. Butterweck, "Suppression of Parasitic Oscillations
in Second-Order Digital Filters by Means of a Controlled-Rounding
Arithmetic", ARCHIV FÜR ELEKTRONIK UND ÜBERTRAGUNGSTECHNIK
(vol. 29, No. 9, pp. 371-374).

Bogner et al, Introduction to Digital Filtering, John 
Wiley & Sons, 1975, pp. 173-174, 364/745. 

Primary Examiner— David H. Malzahn 
Attorney, Agent, or Firm — Nixon & Vanderhye 

[57] ABSTRACT 

A 1st-order digital filter network has a quantizer (Q)
reducing the number of output bits following multiplication
(in M3). The quantizer omits insignificant bits,
adding one LSB (boosting) or not (truncating) in dependence
on the value of the multiplier output (D),
FIRST ORDER DIGITAL FILTER WITH CONTROLLED BOOST/TRUNCATE QUANTIZER

BACKGROUND OF THE INVENTION

The present invention concerns digital filters, and more especially quantization arrangements for 1st-order recursive filters.

The processors used to realize practical digital filters usually operate with fixed-point arithmetic of finite precision. Typically, the signs and magnitudes of signals are represented by 16-bit binary numbers, and the multipliers also have a finite coefficient range of, say, 16 bits. Under these circumstances, the resulting products are 31 bits long. This number field expansion could prove especially problematical in a recursive filter structure where the presence of a feedback loop containing a multiplier could result in the number length growing with each pass through the loop. To prevent number field overflow occurring, the filter algorithm needs to incorporate a quantization routine.

A number of possible quantization strategies have been proposed; each introduces some signal distortion, but some have a less severe effect on the overall performance of the network than others.

Several quantization strategies have been researched in detail, but most of the reported work relates to 2nd order networks, where the non-linear process gives rise to self-sustaining periodic distortions known as limit cycles. See, for example:

(1) Jackson, L B: 'An analysis of roundoff noise in digital filters,' D.Sc dissertation, Stevens Inst of Technology, Hoboken, NJ, 1969.

(2) Sandberg, I W and Kaiser, J P: 'A bound on limit cycles in fixed-point implementation of digital filters,' IEEE Trans Audio and Electroacoustics, Vol AU-20, June 1972, pp 110-112.

(3) Kieburtz, R B et al: 'Control of limit cycles in recursive digital filters by random quantisation,' IEEE Trans Circuits and Systems, Vol CAS-24, June 1977, pp 291-299.

(4) Lawrence, V B and Lee, A E: 'Quantisation schemes for recursive digital filters,' in IEEE Symposium on Circuits and Systems, Rome 1982.

(5) Parker, S R: 'Limit cycles and correlated noise in digital filters' in Digital Signal Processing, ed J K Aggarwal, Western Periodicals Company, 1979, pp 123-129.

This is of limited relevance to 1st-order networks, where the effect of quantization is usually observed as a constant off-set to the output signal, or a latch-up situation which masks signals below a certain level.

SUMMARY OF THE INVENTION

The present invention relates to a first-order digital filter comprising, in a first aspect, an input connected via subtraction means, multiplication means and addition means to an output; delay means for supplying the output signal to inputs of both the subtractor and the adder; and quantization means for reducing the number of bits in the output signal, the quantization means being arranged in operation to selectively either truncate or truncate and increment the output signal in dependence on the value of the signal at the output of the multiplier. Advantageously the quantizer is such that the output is incremented or not according to whether the multiplier output is respectively positive or negative.

The present invention also relates to a first-order digital filter in which, in a second aspect, there is quantization means for reducing the number of bits in the output signal following a multiplication stage arranged to be controlled in dependence on the value of the difference between the unquantized output sample and the value of previous output sample.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a simple configuration of a first-order, digital, unit-gain leaky, integrator network;

FIG. 2 shows the equivalent canonic configuration of the integrator network shown in FIG. 1;

FIG. 3 shows a practical configuration of the integrator network shown in FIG. 2 including a quantizer in the output path;

FIG. 4 shows an equivalent hypothetical configuration of the integrator network shown in FIG. 3; and

FIG. 5 shows an alternate configuration of the FIG. 3 embodiment.

DETAILED DESCRIPTION OF AN EXEMPLARY EMBODIMENT

The z-domain transfer function G(z) of a digital leaky integrator network is usually written as

$$G(z) = \frac{1}{1 - a z^{-1}} \tag{1}$$

where 0 < a < 1, and a relates to the integration time constant.

If we make the substitution

$$a = 1 - \alpha \qquad (0 < \alpha < 1) \tag{2}$$

in equation 1, it can be seen that the dc gain of the network is 1/α. Thus the transfer function of a unity gain leaky integrator needs to include a scaling factor α. This gives

$$H(z) = \frac{\alpha}{1 - (1 - \alpha) z^{-1}} \quad \text{(a)} \qquad = \frac{\alpha}{1 - a z^{-1}} \quad \text{(b)} \tag{3}$$

This implies a processing scheme which utilizes two multipliers M1, M2, an adder A1 and a 1-sample delay D, as shown in FIG. 1. However the processing scheme adopted in FIG. 2 has the same transfer function, but requires only one multiplier M3. Thus, although it now uses two adders A2, A3, it is considered a more efficient processing configuration.

FIG. 3 incorporates a quantizer Q, and is a practical processing scheme for realizing unity gain leaky integrators. It is the behavior of this realization, when using 2's complement arithmetic, that is to be discussed.

For the following analysis we will assume that the adders and multipliers, shown in FIG. 3, operate to full precision, and that the quantized signal has the same precision as the input signal. We will also assume that the negated signal at node B is obtained by inverting all the bits of the 2's complement number as it comes from the delay element. Thus, compared with a true 2's complement negation, the number at B will be in error by 1 least significant bit (lsb); 1 lsb is defined here as the weight of the step size of the quantizer. It will be shown that the effect of this error at the output of the network will be minimal. However, the errors introduced by some types of quantizers can have more significant consequences.
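As an illustrative aside (not part of the patent text), the claim that the two-multiplier scheme of FIG. 1 and the one-multiplier scheme of FIG. 2 share the same transfer function can be checked numerically. The sketch below assumes exact rational arithmetic, an example value α = 1/8, and hypothetical function names `fig1_step`, `fig2_step` and `run`; it also checks the stated 1 lsb difference between bit inversion and true 2's complement negation.

```python
from fractions import Fraction


def fig1_step(x, y_prev, alpha):
    """FIG. 1 form: two multipliers (alpha and 1 - alpha) and one adder."""
    return alpha * x + (1 - alpha) * y_prev


def fig2_step(x, y_prev, alpha):
    """FIG. 2 form: one multiplier, one subtractor and one adder."""
    return y_prev + alpha * (x - y_prev)


def run(step, xs, alpha):
    """Feed the samples xs through the recursion, starting from a zero state."""
    y = Fraction(0)
    out = []
    for x in xs:
        y = step(x, y, alpha)
        out.append(y)
    return out


alpha = Fraction(1, 8)            # assumed example value, 0 < alpha < 1
unit_step = [Fraction(1)] * 50    # constant (dc) input of 1
y1 = run(fig1_step, unit_step, alpha)
y2 = run(fig2_step, unit_step, alpha)

# Both forms give identical outputs, and the step response tends to 1,
# which is the unity dc gain obtained by including the scaling factor alpha.
same_response = (y1 == y2)
residual = 1 - y1[-1]             # equals (1 - alpha) ** 50

# Inverting all bits of a 2's complement integer gives -g - 1, i.e. it is
# 1 lsb below the true negation -g.
inversion_offsets = [(-g) - (~g) for g in range(-8, 8)]
```

With exact fractions the two recursions agree sample for sample, so any difference between practical realizations comes from the quantizer rather than from the choice of structure.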
In FIG. 3, the quantization error (e') is considered to be the difference between the full-precision number at node F and its quantized value at node G. The error at node E is e, where e is e' delayed by 1 sample period. Similarly, the error at node B will be (-e-1) lsb. Thus, knowing the errors in the two feedback loops, it is possible to write down the total error (σ) generated at node F. The units of σ and e are in lsb's.

$$\sigma = e + \alpha(-1 - e) \tag{4}$$

This is the error that is injected into the network each pass through the system. It is possible to analyze the behavior of the network by making reference to the hypothetical equivalent processing scheme shown in FIG. 4. Here, it is considered that all the component parts operate to infinite precision, and that this equivalent configuration has an additional input signal σ at node H. It is not difficult to see that the transfer function between node H and the output is

$$H(z) = \frac{1}{1 - (1 - \alpha) z^{-1}} \tag{5}$$

which has the same form as equation 1, and we have already seen that this has a dc gain of 1/α. Thus the error at the output will have a dc value r, where

$$r = \frac{[\text{dc component of } \sigma]}{\alpha} \tag{6}$$

These are the system equations which will be used to investigate the effects of the various quantization strategies.

It would of course be possible to quantize the signal by rounding, or by simple truncation. 2's complement rounding is equivalent to adding ½ lsb to the number and then truncating the result. Therefore, for a rounding quantizer the error (e) at node E will be between (-½+d) lsb and +½ lsb, where d is a positive fractional number. The asymmetry of the bound arises because a value that could be quantized with an error of -½ lsb would in fact be quantized to the next lower value, and be in error by +½ lsb. The value of d is equal to the smallest change that can occur in the full precision part of the network. It can therefore be shown that d=α. Applying these bounds, it can be seen that when the input signal is changing, the dc component of e will be near zero. Therefore, under these conditions, the value of r will be negligible. However, in certain circumstances, particularly when the variance of the input signal is small or zero, successive samples of e become correlated, and this leads to e having a dc component in the range -½+α to +½ lsb's. Thus, using equations 4 and 6, it can be seen that an accumulated dc error in the range (-1/(2α) - α + ½) and (+1/(2α) - 3/2) lsb's can occur, resulting in a latch-up condition whereby input signals between these levels have no effect on the output.

With 2's complement truncation, on the other hand, all the bits after the truncation point are ignored. This means that the error (e) at node E will be in the range (-1+α ≤ e ≤ 0) lsb. As in the rounding case, the error at node B will be (-1-e) lsb, which, for example, will range between -α and -1 lsb. Equations 4, 5 and 6 also apply to this truncating quantizer. Thus the analysis of the finite arithmetic effects can be performed using the same arguments as for the network containing the rounding quantizer. However, in this case, when large input signals are present e will have a mean dc component of approximately -½ lsb. Thus the output will always exhibit a dc error of around 1/(2α) lsb. The small input condition errors are even more severe, with latch-up occurring at values between (-1/α + 1 - α) and -1 lsb's.

Alternatively, the dc component introduced by the truncation process could be offset by randomly boosting the quantized signal. The quantization algorithm for this type of quantizer is as follows: truncate the signal, then randomly add either 0 or 1 lsb to the truncated value. The error (e) introduced by this process will be in the range (-1+α ≤ e ≤ 1) lsb. This means that the long-term dc component of e will be approximately zero. Therefore, from equation 4 it can be seen that the dc component of σ will be around -α. Thus, substituting in equation 6, we see that the mean accumulated error at the output of the network will be of the order of -1 lsb, and this is true whatever the nature of the input signal. Hence, a network using this scheme will not suffer from the high-level latch-up conditions exhibited by the previous two quantization strategies. The slight drawbacks to the method are that a suitable random control signal has to be generated, and the stochastic nature of the quantizer means that the short-term error on the output can be much higher than the mean error.

In the present invention, however, a Controlled Boosting and Truncating quantizer is employed. Detailed examination of the above quantization strategies shows that output signal latch-up does not always occur. For example: the output of a network containing a truncating quantizer will decay away to zero, under zero input conditions, if the starting values are positive numbers. Similarly, if the network contains a boosting quantizer the output will go to zero if the starting value is negative. Thus, choosing the appropriate quantization strategy on a polarity basis could lead to an improvement in output accuracy.

A simple choice of truncating positive numbers and boosting negative numbers would result in satisfactory operation for zero input conditions, but could result in off-set errors for high level signals, and low-level non-zero input signals being obscured. However, it has been found that these problems can be overcome if the polarity of the signal at node D is used to control the boost/truncate option. The control signal is easily derived at D: if the polarity is positive the quantizer should boost; if it is negative it should truncate. Thus, the quantizer is forced to take notice of the input signal, and this prevents the input signal being lost in the quantization noise.

A network which uses this controlled boost/truncate type of quantizer behaves in a much more orderly manner than the one that chooses randomly. Impulse responses and decay rates are predictable and do not exhibit localized gross errors that increase the low frequency content of the quantization noise.

Thus, of the four quantizers discussed, it would appear that the one adopting a controlled boosting or truncating quantization strategy would give the best performance. A quantizer of this sort is easy to implement, and results in a stable network which does not suffer from latch-up conditions or excessive quantization noise.

Since, as shown in FIG. 3, the unquantized signal F_n at node F is equal to the sum of the unquantized signal D_n at node D and the just previous quantized output signal G_{n-1} at node E, it follows from simple algebra that the quantization means could also be equivalently arranged to be controlled in dependence on the value of the difference between the unquantized signal F_n and the value of the just previous quantized output sample G_{n-1} (as depicted in FIG. 5).
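The behavior of the rounding, truncating and controlled boost/truncate quantizers described above can be explored with a small integer simulation of the FIG. 3 loop. This is only a sketch under stated assumptions, not the patented implementation: signals are Python integers in lsb units, α is fixed at 1/8 (`FRAC_BITS = 3`, `ALPHA_NUM = 1`), node B is formed by bit inversion as the description assumes, and a multiplier output of exactly zero is treated as "not positive" (truncate), a tie-break the text does not specify. The names `step` and `run` are hypothetical.

```python
FRAC_BITS = 3   # assumed: alpha = ALPHA_NUM / 2**FRAC_BITS = 1/8
ALPHA_NUM = 1   # assumed numerator of alpha


def step(x, g_prev, mode):
    """One pass through the loop; x and g_prev are integers in lsb units."""
    b = ~g_prev                        # node B: bit inversion, equals -g_prev - 1
    diff = x + b                       # subtractor output
    d = ALPHA_NUM * diff               # node D: product with FRAC_BITS fractional bits
    f = d + (g_prev << FRAC_BITS)      # node F: full-precision sum of D and E
    if mode == "round":
        # add half an lsb, then truncate
        return (f + (1 << (FRAC_BITS - 1))) >> FRAC_BITS
    q = f >> FRAC_BITS                 # truncation: drop the fractional bits
    if mode == "controlled" and d > 0:
        q += 1                         # boost only when node D is positive (assumed tie-break at 0)
    return q                           # mode "truncate" returns the plain truncation


def run(xs, g0, mode):
    """Run the filter over the input samples xs from initial output g0."""
    g = g0
    out = []
    for x in xs:
        g = step(x, g, mode)
        out.append(g)
    return out


zero_input = [0] * 12
stuck = run(zero_input, -5, "truncate")        # stays at -5: latch-up
decays = run(zero_input, -5, "controlled")     # moves toward zero, settles at -1
rounded = run(zero_input, 5, "round")          # stops short of zero at 3
tracking = run([10] * 12, 0, "controlled")     # settles 1 lsb below the input
```

In this model, with zero input, the truncating quantizer never leaves a starting value of -5 and the rounding quantizer stalls at 3 from a starting value of 5, while the controlled quantizer settles at -1 lsb from either polarity. The residual -1 lsb comes from the bit-inversion negation at node B.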
I claim:

1. A first-order digital filter comprising:
an input node connected via subtraction means, multiplication means, addition means and quantization means to an output node for accepting an input digital signal at said input node and providing an output digital signal at said output node which results from operation of said subtraction means, multiplication means, addition means and quantization means upon said input digital signal;
said quantization means reducing the number of bits appearing at said output node relative to the number of bits resulting from operation of said subtraction means, multiplication means and addition means; and
delay means for feeding back the output digital signal to both the subtraction means and the addition means;
the quantization means being arranged in operation to selectively either (a) truncate or (b) truncate and increment the signal input thereto in dependence on the value of the output of the multiplication means.

2. A filter according to claim 1 in which the quantization means is such that the output is incremented or not according to whether the output of the multiplication means is respectively positive or negative.

3. A first-order digital filter including:
a digital multiplication means connected via addition means to quantization means for accepting successive unquantized samples F_n downstream from said multiplication means and for reducing the number of bits in said unquantized samples F_n to produce a quantized output sample signal G_n,
means for generating the difference between an unquantized sample F_n and the value of the just-previous quantized output sample G_{n-1}; and
wherein said quantization means produces said quantized output sample signal in dependence on the value of said difference.

4. A first-order digital filter comprising:
digital processing means for accepting successive input digital signal samples of N bits each, (a) subtracting a supplied feedback digital signal therefrom to provide a resultant signal, (b) multiplying the resultant signal by another digital factor to produce a first intermediate result signal having more than N bits, and (c) adding said supplied feedback digital signal to the first intermediate result signal to produce a second intermediate result signal also having more than N bits;
quantization means for accepting said second intermediate result signal and for providing an N bit output signal therefrom by selectively (a) truncating or (b) truncating and incrementing said second intermediate result signal under control of said first intermediate result signal; and
delay means connected to receive said N-bit output signal and for subsequently supplying same as said supplied feedback digital signal to said digital processing means.

5. A first-order digital filter as in claim 4 wherein said quantization means (a) truncates when said first intermediate result signal has a negative polarity and (b) truncates and increments when said first intermediate result signal has a positive polarity.

6. A first-order digital filtering process comprising; 

(i) accepting successive input digital signal samples of 
N bits each, (a) subtracting a supplied feedback 
digital signal therefrom to provide a resultant sig- 
nal, (b) multiplying the resultant signal by another 
digital factor to produce a first intermediate result 
signal having more than N bits, and (c) adding said 
supplied feedback digital signal to the first interme- 
diate result signal to produce a second intermediate 
result signal also having more than N bits; 

(ii) accepting said second intermediate result signal 
and providing an N bit output signal therefrom by 
selectively (a) truncating or (b) truncating and 
incrementing said second intermediate result signal 
under control of said first intermediate result sig- 
nal; and 

(iii) supplying said N bit output signal as said supplied 
feedback digital signal to said digital processing 
means during the next subsequent performance of 
step (i). 

7. A first-order digital filtering process as in claim 6 
wherein step (ii) uses (a) truncation when said first inter- 
mediate result signal has a negative polarity and (b) 
truncation and incrementation when said first interme- 
diate result signal has a positive polarity. 
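For readers who prefer code to claim language, steps (i) to (iii) of the process can be sketched as a streaming loop, together with the alternative control signal of FIG. 5 (the difference between the unquantized sample F_n and the previous quantized output G_{n-1}). Everything here is an assumption made for illustration: integer samples in lsb units, a multiplying factor of 1/8 (`FRAC_BITS = 3`), subtraction implemented by bit inversion as in the description, zero treated as "not positive", and the hypothetical names `filter_step` and `process`.

```python
FRAC_BITS = 3   # assumed: the multiplying factor is 1 / 2**FRAC_BITS


def filter_step(x, feedback, control_from_difference=False):
    """One performance of steps (i) and (ii); returns the quantized output sample."""
    resultant = x + ~feedback                     # (i)(a) subtract feedback (bit inversion assumed)
    first = resultant                             # (i)(b) product, scaled by 2**FRAC_BITS
    second = first + (feedback << FRAC_BITS)      # (i)(c) add the feedback back in
    if control_from_difference:
        # FIG. 5 style control: F_n - G_{n-1}, on the same fixed-point scale
        control = second - (feedback << FRAC_BITS)
    else:
        control = first                           # FIG. 3 style control: node D
    out = second >> FRAC_BITS                     # (ii)(a) truncate
    if control > 0:
        out += 1                                  # (ii)(b) truncate and increment
    return out


def process(samples, initial=0, control_from_difference=False):
    """Step (iii): each output becomes the feedback for the next sample."""
    feedback = initial
    outputs = []
    for x in samples:
        feedback = filter_step(x, feedback, control_from_difference)
        outputs.append(feedback)
    return outputs


samples = [0, 10, 10, 10, -7, -7, 3, 0, 0, 0]
via_multiplier = process(samples)
via_difference = process(samples, control_from_difference=True)
```

Because F_n is the sum of D_n and G_{n-1}, the two control signals are algebraically identical, so both variants produce the same output sequence in this sketch.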